90 research outputs found

    ERBlox: Combining Matching Dependencies with Machine Learning for Entity Resolution

    Entity resolution (ER), an important and common data cleaning problem, is about detecting duplicate representations of the same external entities in data and merging them into single representations. Relatively recently, declarative rules called matching dependencies (MDs) have been proposed for specifying similarity conditions under which attribute values in database records are merged. In this work we show the process and the benefits of integrating three components of ER: (a) classifiers for duplicate/non-duplicate record pairs, built using machine learning (ML) techniques; (b) MDs for supporting both the blocking phase of ML and the merge itself; and (c) the declarative language LogiQL (an extended form of Datalog supported by the LogicBlox platform) for data processing and for the specification and enforcement of MDs. Comment: To appear in Proc. SUM, 201
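    The three-step pipeline above (MD-based blocking, a classifier on candidate pairs, then an MD-driven merge) can be sketched as a toy Python program. This is not ERBlox and does not use LogiQL; it assumes bigram Jaccard similarity for the MD's similarity condition, a hypothetical threshold scorer `classify` standing in for a trained ML classifier, and a hypothetical "keep the longest value" merge function.

```python
from itertools import combinations


def bigrams(s: str) -> set[str]:
    """Character bigrams of a lower-cased, padded string."""
    s = f"#{s.lower()}#"
    return {s[i:i + 2] for i in range(len(s) - 1)}


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of bigram sets (assumed similarity operator of the MD)."""
    x, y = bigrams(a), bigrams(b)
    return len(x & y) / len(x | y) if x | y else 1.0


def md_blocks(records, attr, threshold):
    """Blocking: keep only pairs whose `attr` values satisfy the MD similarity condition."""
    return [(i, j) for i, j in combinations(range(len(records)), 2)
            if jaccard(records[i][attr], records[j][attr]) >= threshold]


def classify(r1, r2) -> bool:
    """Hypothetical stand-in for a trained classifier: average of two similarity features."""
    score = (jaccard(r1["name"], r2["name"]) + jaccard(r1["address"], r2["address"])) / 2
    return score >= 0.5


def resolve(records, pairs):
    """Merge: cluster duplicate pairs (union-find) and give each cluster one record."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in pairs:
        parent[find(i)] = find(j)
    clusters = {}
    for i, rec in enumerate(records):
        clusters.setdefault(find(i), []).append(rec)
    # Hypothetical merge function: keep the longest value of each attribute.
    return [{k: max((r[k] for r in group), key=len) for k in group[0]}
            for group in clusters.values()]


records = [
    {"name": "John Smith", "address": "12 Main St"},
    {"name": "Jon Smith", "address": "12 Main Street"},
    {"name": "Mary Jones", "address": "4 Oak Ave"},
]
candidates = md_blocks(records, "name", 0.5)
duplicates = [(i, j) for i, j in candidates if classify(records[i], records[j])]
print(resolve(records, duplicates))
```

    Blocking with the MD condition keeps the classifier from scoring every pair; here only the two "Smith" records survive blocking, and the merge yields one record with `"12 Main Street"` as the address.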

    Evaluating privacy-preserving record linkage using cryptographic long-term keys and multibit trees on large medical datasets

    Background: Integrating medical data from different source databases by record linkage is a powerful technique increasingly used in medical research. Under many jurisdictions, the unique personal identifiers needed for linking records are unavailable. Since sensitive attributes, such as names, have to be used instead, privacy regulations usually demand that these identifiers be encrypted. The corresponding set of techniques for privacy-preserving record linkage (PPRL) has received widespread attention. One recent method is based on Bloom filters. Due to their superior resilience against cryptographic attacks, composite Bloom filters (cryptographic long-term keys, CLKs) are considered best practice for privacy in PPRL. The real-world performance of these techniques on large-scale data has so far been unknown. Methods: Using a large subset of Australian hospital admission data, we tested the performance of an innovative PPRL technique (CLKs using multibit trees) against a gold standard derived from clear-text probabilistic record linkage. Linkage time and linkage quality (recall, precision and F-measure) were evaluated. Results: Clear-text probabilistic linkage resulted in marginally higher precision and recall than CLKs. PPRL required more computing time, but 5 million records could still be de-duplicated within one day. However, the PPRL approach required fine tuning of parameters. Conclusions: We argue that the increased privacy of PPRL comes at the price of small losses in precision and recall and a large increase in computational burden and setup time. These costs seem to be acceptable in most applied settings, but they have to be considered in the decision to apply PPRL. Further research on the optimal automatic choice of parameters is needed.
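    The CLK idea (hashing the q-grams of several identifying fields into a single Bloom filter with secret keys, then comparing filters instead of names) and the three quality measures can be sketched in Python. This is an illustration under stated assumptions, not the study's implementation: it assumes bigrams, two keyed HMACs (SHA-256 and SHA-1) combined by double hashing, hypothetical parameters `m = 1000` and `k = 20`, and the Dice coefficient as the similarity. It compares pairs by brute force rather than through the multibit-tree index the study evaluated.

```python
import hashlib
import hmac


def qgrams(value: str, q: int = 2) -> set[str]:
    """Padded, lower-cased q-grams of one identifier field."""
    v = f"_{value.lower()}_"
    return {v[i:i + q] for i in range(len(v) - q + 1)}


def clk(fields: list[str], key1: bytes, key2: bytes, m: int = 1000, k: int = 20) -> set[int]:
    """Encode all fields into ONE Bloom filter, returned as the set of 1-bit positions.

    Bit positions use double hashing g_i(x) = (h1(x) + i * h2(x)) mod m, with h1 and h2
    derived from two secret-keyed HMACs (an assumed choice of hash functions).
    """
    bits: set[int] = set()
    for field in fields:
        for gram in qgrams(field):
            data = gram.encode()
            h1 = int.from_bytes(hmac.new(key1, data, hashlib.sha256).digest(), "big")
            h2 = int.from_bytes(hmac.new(key2, data, hashlib.sha1).digest(), "big")
            for i in range(k):
                bits.add((h1 + i * h2) % m)
    return bits


def dice(a: set[int], b: set[int]) -> float:
    """Dice coefficient of two Bloom filters: 2|A & B| / (|A| + |B|)."""
    return 2 * len(a & b) / (len(a) + len(b)) if a or b else 1.0


def precision_recall_f(predicted: set, truth: set) -> tuple[float, float, float]:
    """Linkage quality of predicted matches against a gold standard of true matches."""
    tp = len(predicted & truth)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(truth) if truth else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


key1, key2 = b"secret-key-1", b"secret-key-2"  # hypothetical shared secrets
a = clk(["John", "Smith"], key1, key2)
print(dice(a, clk(["Jon", "Smyth"], key1, key2)), dice(a, clk(["Mary", "Jones"], key1, key2)))
```

    Because the keys stay secret, parties can exchange filters without exchanging names; a match threshold on the Dice score would be one more parameter to tune.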

    Validation of the Cognitive Assessment of Later Life Status (CALLS) instrument: a computerized telephonic measure

    Background: Brief screening tests have been developed to measure cognitive performance and dementia, yet they measure limited cognitive domains and often lack construct validity. Neuropsychological assessments, while comprehensive, are too costly and time-consuming for epidemiological studies. This study aimed to develop a psychometrically valid, telephone-administered test of cognitive function in aging. Methods: Test development followed a sequential hierarchical strategy: no stage proceeded until specified criteria had been met. The 30-minute Cognitive Assessment of Later Life Status (CALLS) measure and a 2.5-hour in-person neuropsychological assessment were administered to a randomly selected sample of 211 participants aged 65 years and older that included equivalent distributions of men and women from ethnically diverse populations. Results: Overall Cronbach's coefficient alpha for the CALLS test was 0.81. A principal component analysis of the CALLS tests yielded five components. The CALLS total score was significantly correlated with four neuropsychological assessment components. Older age and having a high school education or less were significantly correlated with lower CALLS total scores. Females scored better overall than males. There were no score differences based on race. Conclusion: The CALLS test is a valid measure that provides a unique opportunity to reliably and efficiently study cognitive function in large populations.
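    The reliability figure reported above is Cronbach's coefficient alpha, alpha = k/(k - 1) * (1 - sum of item variances / variance of total scores), for k items. A minimal Python sketch of that standard formula follows; the input layout (one list of respondent scores per item) is an assumption, and the toy data are not CALLS data.

```python
from statistics import variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha; items[i][j] is respondent j's score on item i.

    Sample variances are used throughout, so the (n - 1) scaling cancels in the ratio.
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's total score
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


# Two perfectly consistent (parallel) items give alpha = 1.
print(cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4]]))
```

    Alpha rises toward 1 as items covary strongly relative to their individual spread, which is why a value such as 0.81 is read as good internal consistency.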

    An AP-MS- and BioID-compatible MAC-tag enables comprehensive mapping of protein interactions and subcellular localizations

    Protein-protein interactions govern almost all cellular functions. These complex networks of stable and transient associations can be mapped by affinity purification mass spectrometry (AP-MS) and by complementary proximity-based labeling methods such as BioID. To exploit the advantages of both strategies, here we design and optimize an integrated approach combining AP-MS and BioID in a single construct, which we term MAC-tag. We systematically apply the MAC-tag approach to 18 subcellular and 3 sub-organelle localization markers, generating a molecular context database that can be used to define a protein's molecular location. In addition, we show that combining the AP-MS and BioID results makes it possible to obtain interaction distances within a protein complex. Taken together, our integrated strategy enables the comprehensive mapping of the physical and functional interactions of proteins, defining their molecular context and improving our understanding of the cellular interactome. Peer reviewed

    A practical guide to photoacoustic tomography in the life sciences

    The life sciences can benefit greatly from imaging technologies that connect microscopic discoveries with macroscopic observations. One technology uniquely positioned to provide such benefits is photoacoustic tomography (PAT), a sensitive modality for imaging optical absorption contrast over a range of spatial scales at high speed. In PAT, endogenous contrast reveals a tissue's anatomical, functional, metabolic, and histologic properties, and exogenous contrast provides molecular and cellular specificity. The spatial scale of PAT covers organelles, cells, tissues, organs, and small animals. Consequently, PAT is complementary to other imaging modalities in contrast mechanism, penetration, spatial resolution, and temporal resolution. We review the fundamentals of PAT and provide practical guidelines for matching PAT systems with research needs. We also summarize the most promising biomedical applications of PAT, discuss related challenges, and envision PAT's potential to lead to further breakthroughs.